DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts

نویسندگان

  • Donghyung Lee
  • T. Bernard Bigdeli
  • Vernell S. Williamson
  • Vladimir I. Vladimirov
  • Brien P. Riley
  • Ayman H. Fanous
  • Silviu-Alin Bacanu
چکیده

MOTIVATION To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever increasing size and ethnic diversity of both reference panels and cohorts makes genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which unlike summary statistics provided by virtually all studies, is not publicly available. While there are much less demanding methods which avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. RESULTS To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources. AVAILABILITY AND IMPLEMENTATION DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix. CONTACT [email protected] SUPPLEMENTARY INFORMATION Supplementary Data are available at Bioinformatics online.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

JEPEGMIX: gene-level joint analysis of functional SNPs in cosmopolitan cohorts

MOTIVATION To increase detection power, gene level analysis methods are used to aggregate weak signals. To greatly increase computational efficiency, most methods use as input summary statistics from genome-wide association studies (GWAS). Subsequently, gene statistics are constructed using linkage disequilibrium (LD) patterns from a relevant reference panel. However, all methods, including our...

متن کامل

DIST: direct imputation of summary statistics for unmeasured SNPs

MOTIVATION Genotype imputation methods are used to enhance the resolution of genome-wide association studies, and thus increase the detection rate for genetic signals. Although most studies report all univariate summary statistics, many of them limit the access to subject-level genotypes. Because such an access is required by all genotype imputation methods, it is helpful to develop methods tha...

متن کامل

JEPEG: a summary statistics based tool for gene-level joint testing of functional variants

MOTIVATION Gene expression is influenced by variants commonly known as expression quantitative trait loci (eQTL). On the basis of this fact, researchers proposed to use eQTL/functional information univariately for prioritizing single nucleotide polymorphisms (SNPs) signals from genome-wide association studies (GWAS). However, most genes are influenced by multiple eQTLs which, thus, jointly affe...

متن کامل

Imputation of parent-offspring trios and their effect on accuracy of genomic prediction using Bayesian method

The objective of this study was to evaluate the imputation accuracy of parent-offspring trios under different scenarios. By using simulated datasets, the performance Bayesian LASSO in genomic prediction was also examined. The genome consisted of 5 chromosomes and each chromosome was set as 1 Morgan length. The number of SNPs per chromosome was 10000. One hundred QTLs were randomly distributed a...

متن کامل

An atlas of genetic associations in UK Biobank

Genome-wide association studies have revealed many loci contributing to the variation of complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biob...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 31  شماره 

صفحات  -

تاریخ انتشار 2015